1  Summary

The purpose of this in-house research project was to investigate supervised and semi-supervised
machine learning techniques for classification and pattern recognition by exploiting the natural
sparsity in signals and through data dimension reduction, and to develop and tailor
algorithms for the extraction of intelligence from several huge heterogeneous data sets. The
research provides a mathematically rigorous foundation for models and algorithms that could
be applied toward technologies in the areas of autonomy, trusted systems, situational awareness,
and machine-guided data-to-decision processes.

This report describes the delivered results for three related research areas. First, an algorithm
to approximate the Dantzig selector is described. The Dantzig selector is a method to
approximate a sparse vector that captures the most essential information in a large amount
of data. The algorithm, which was developed in the course of this research effort, is an
iterative approach based upon solutions to a pair of proximity operator equations. The
algorithm is an improvement over current state-of-the-art methods in that it produces results
of similar quality, but tends to converge significantly faster. Next, an $\ell_1$ minimization
model is extended to incorporate overcomplete dictionaries. The extension allows one to
obtain a sparse representation of homogeneous and heterogeneous data, which in turn is
used to improve classification and pattern recognition using the sparse coefficient vectors.
Additionally, the proposed method is demonstrated to separate composite signals using a
supervised machine learning technique. Finally, a unique method to perform rotation-invariant
pattern recognition is described. The method is based upon an efficient strategy
for approximating the Gaussian-Hermite moments of a function using a collocation-based
optimization approach. The method described herein is an improvement in accuracy and
complexity over the commonly used brute-force approaches.

2  Introduction 

Machine learning is the field concerned with the conversion of data into usable information by
a computer through the discovery of patterns and trends that are present in the data, but
are typically difficult for a human to discern due to the sheer mass of the data and nuanced
interactions between variables in the feature space. A general machine learning model seeks
to label the high-dimensional input data with the appropriate output classifiers. That is,
if $X = \{x_1, x_2, \ldots, x_k \mid x_j \in \mathbb{R}^d\}$ is a collection of $d$-dimensional
real-valued input vectors and $Y = \{y_1, y_2, \ldots, y_k \mid y_j \in \mathbb{R}\}$ is a
collection of real-valued output labels, a machine learning process attempts to find a
well-defined function $f : X \to Y$ that accurately matches each $x_j$ with its appropriate
label $f(x_j) = y_j$. Machine learning techniques can follow the supervised, unsupervised,
or semi-supervised paradigms.

Supervised machine learning techniques are used when the user has some known ground
truth pairs $(x, y)$ available. The set of vectors with known classifiers is called the training
set, and this knowledge is exploited to intelligently extrapolate function pairs $f(x) = y$
for vectors $x$ that are not encountered in the training set. Some specific approaches that
are categorized as supervised learning techniques include support vector machines, decision
trees, and the training of artificial neural networks, where each approach admits a number
of specific algorithms. A standard example of supervised learning techniques is OCR (optical
character recognition), which was popularized by the United States Postal Service, but
supervised learning techniques have applications in any general pattern recognition environment
where a known training set is available. A properly sampled training set can prepare
the system for success; however, reliance on a training set that does not accurately capture
the array of data one might encounter can lead to misclassification and unreliable results.
Additionally, the algorithms used in supervised learning work best for data that can be
expressed in a feature space of relatively small effective dimension with linear relations among
the features.

On the other hand, unsupervised machine learning techniques classify the input data
according to similar structures that reveal themselves through mathematical data dimension
reduction and feature extraction. These techniques are employed when no training set is
available, and therefore no target output attributes are known. Unsupervised learning
algorithms cluster the data into sets that exhibit similar properties. Common approaches to
unsupervised machine learning include k-means clustering algorithms, and unsupervised learning
can be achieved through a variety of methods including PCA (principal component analysis)
and SVD (singular value decomposition) techniques. The absence of a training set and target
classifier can offer advantages over supervised learning; allowing the data to reveal their own
correlations instead of having user-imposed restrictions on classification labels can provide
richer intelligence from the information.

The research described below draws from supervised and unsupervised machine learning
paradigms, with an emphasis on reducing the effective dimension of the data by finding
accurate sparse representations of the data.

3  Methods,  Assumptions,  and  Procedures 

The  objective  of  this  research  is  to  investigate  machine  learning  techniques  for  classification 
and  pattern  recognition,  and  to  develop  and  tailor  adaptive  algorithms  for  application  to 
huge,  heterogeneous  data  sets  enabling  the  extraction  of  intelligence  from  information. 

The  approach  uses  two-scale  supervised  and  unsupervised  machine  learning  techniques 
to  discover  inter-  and  intra-set  relationships  and  to  reduce  the  dimensionality  of  the  data. 

Unsupervised data dimension reduction through the use of $\ell_1$ norm minimization and
PCA will be performed within each data set and across multiple sets to harvest the most
significant features while suppressing spurious information. The techniques used ensure
minimal redundancy in the representation of the data, and allow one to discover significant
interactions among the relevant features in each set and to enable better situational
awareness.

For the supervised learning aspect of the research, we will begin with a large collection
of known information, and analyze the characterizing relationship among the data in each
class. This will lead to a sufficiently large set of ground truth, as well as giving guidance
on the number of classes to be used and the correct classification function $f$. These cannot
be known a priori; access to initial information is fundamental in developing a supervised
learning scheme. However, if the relationships in the data tend to be described well by a
linear classifier, the PCA technique can be used again in the discovery of $f$. For example,
if the principal component of $x$ is $p$, and $x$ is known to belong to class $y$, then any newly
encountered data with principal component $p$ shall also be assigned to class $y$.
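The principal-component rule in the example above can be read in more than one way. The following is a minimal sketch of one possible reading, not the implementation used in this effort: each class is summarized by the leading direction of its training vectors, and a new vector is assigned to the class whose direction it is most closely aligned with. The function names `leading_direction` and `classify`, the use of an uncentered second-moment matrix, and the power iteration are all assumptions made for illustration.

```python
import math

def leading_direction(vectors, iters=200):
    """Leading eigenvector of the uncentered second-moment matrix (power iteration).

    Assumption: the start vector is not orthogonal to the leading direction.
    """
    d = len(vectors[0])
    M = [[sum(v[i] * v[j] for v in vectors) for j in range(d)] for i in range(d)]
    p = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        q = [sum(M[i][j] * p[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in q))
        if norm == 0.0:
            break
        p = [x / norm for x in q]
    return p

def classify(x, class_directions):
    """Return the label whose principal direction has the largest |cosine| with x."""
    nx = math.sqrt(sum(v * v for v in x))

    def alignment(p):
        if nx == 0.0:
            return 0.0
        return abs(sum(a * b for a, b in zip(x, p))) / nx

    return max(class_directions, key=lambda label: alignment(class_directions[label]))

# Hypothetical two-class training data, roughly along the two coordinate axes.
class_a = [[2.0, 0.1], [3.0, -0.1], [-1.0, 0.0]]
class_b = [[0.1, 2.0], [-0.1, 1.0], [0.0, -3.0]]
directions = {"a": leading_direction(class_a), "b": leading_direction(class_b)}
```

A new vector is then labeled with `classify(x, directions)`. This sketch only makes sense when the classes are well described by a single linear direction, which is the situation the paragraph above assumes.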

The  algorithms  will  be  developed  and  implemented  using  MATLAB.  Speed  and  memory 
usage  may  be  improved  by  converting  the  algorithms  to  C. 

4  Results  and  Discussion 

Several  new  methods  and  algorithms  were  developed  during  the  course  of  this  research  effort, 
including  an  algorithm  to  quickly  and  accurately  approximate  the  Dantzig  selector  [6],  a 
scheme  to  separate  and  classify  undersampled  composite  data  [7],  and  a  novel  method  to 
perform  rotational  invariant  pattern  recognition  in  images  [8].  These  three  major  research 
products  have  been  summarized  in  journal  publications  and  conference  proceedings.  To 
illustrate  and  verify  the  theoretical  results  of  the  main  research  products  listed  above,  several 
interesting  numerical  experiments  were  performed  using  real-world  and  simulated  data  with 
large,  homogeneous  and  heterogeneous  data  sets.  Each  is  explained  in  more  detail  below. 

4.1  Algorithm  to  Approximate  the  Dantzig  Selector 

The Dantzig selector is a solution to the optimization problem

\[
\hat{\beta} \in \operatorname*{argmin}_{\beta} \left\{ \|\beta\|_1 : \|D^{-1}X^T(X\beta - y)\|_\infty \le \delta \right\}, \tag{1}
\]

where $y$ is the observed data, $X$ is a known data matrix satisfying certain properties [4], $D$
is a diagonal matrix normalizing the columns of $X$, and $\delta$ is a small, user-chosen parameter.
Typically the dimensionality of $\beta$ is much larger than that of $y$; however, since the solution
to Equation (1) tends to be sparse, the Dantzig selector has a much lower effective
dimensionality than the original data. Therefore the Dantzig selector is an appropriate feature to
use for data dimension reduction to assist machine learning and classification tasks.
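To make the constraint in Equation (1) concrete, the following sketch evaluates the quantity $\|D^{-1}X^T(X\beta - y)\|_\infty$ for a candidate vector and checks it against the parameter. It is illustrative only: the function names are hypothetical, and taking the diagonal of $D$ to be the Euclidean norms of the columns of $X$ is an assumption about how the columns are normalized.

```python
import math

def dantzig_residual(X, y, beta):
    """Return ||D^{-1} X^T (X beta - y)||_inf.

    Assumption: D = diag(column 2-norms of X); X has no zero columns.
    """
    n, p = len(X), len(X[0])
    # Residual r = X beta - y
    r = [sum(X[i][j] * beta[j] for j in range(p)) - y[i] for i in range(n)]
    # Diagonal of D: Euclidean norm of each column of X
    d = [math.sqrt(sum(X[i][j] ** 2 for i in range(n))) for j in range(p)]
    # Normalized correlations of each column with the residual
    corr = [sum(X[i][j] * r[i] for i in range(n)) / d[j] for j in range(p)]
    return max(abs(v) for v in corr)

def l1_norm(beta):
    """Objective of Equation (1)."""
    return sum(abs(b) for b in beta)

def is_feasible(X, y, beta, delta, tol=1e-12):
    """True when beta satisfies the constraint of Equation (1) up to tol."""
    return dantzig_residual(X, y, beta) <= delta + tol
```

Among all feasible vectors, the Dantzig selector is one with the smallest value of `l1_norm`.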

Several methods exist to compute a Dantzig selector, including a primal-dual interior
point method [3, 4], a first-order method based upon linear cone programming [1, 2], and an
alternating direction method [5]. Each has its own strengths and weaknesses. For example, the
Alternating Direction Method of Multipliers is an iterative approach that tends to converge
to a solution of (1) in few iterations; however, the total computational cost of this method is
large since each step in the iteration requires the solution of another optimization problem
via an iterative approach. To solve the Dantzig selector problem, Lixin Shen (SU), Bruce
Suter (AFRL/RITB) and Ashley Prater (AFRL/RITB) developed a two-stage approach.
The first stage approximates $\beta$ as the fixed point solution to a pair of iterative proximal
equations. This fixed point solution tends to very accurately recover the support of the
Dantzig selector, but allows errors in the magnitude of the nonzero entries. The second
stage, a postprocessing step, corrects this issue by regressing the observed data onto the
support of the fixed point solution.
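The second stage can be sketched as an ordinary least-squares fit restricted to the columns of $X$ indexed by the support of the first-stage solution. The code below is a minimal illustration under stated assumptions: the support is taken to be the entries whose magnitude exceeds a hypothetical tolerance `tol`, the restricted columns are assumed to be linearly independent, and the normal equations are solved with a small Gaussian elimination routine. The details of the regression used in [6] may differ.

```python
def solve_linear(M, v):
    """Solve M z = v by Gaussian elimination with partial pivoting (small square M)."""
    n = len(M)
    aug = [list(M[i]) + [v[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-14:
            raise ValueError("support columns are linearly dependent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][c] * z[c] for c in range(i + 1, n))
        z[i] = (aug[i][n] - s) / aug[i][i]
    return z

def refit_on_support(X, y, beta_stage1, tol=1e-8):
    """Least-squares refit of y on the columns of X where |beta_stage1| > tol."""
    n, p = len(X), len(X[0])
    support = [j for j in range(p) if abs(beta_stage1[j]) > tol]
    beta = [0.0] * p
    if not support:
        return beta
    # Normal equations restricted to the support: (Xs^T Xs) z = Xs^T y
    G = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in support] for a in support]
    rhs = [sum(X[i][a] * y[i] for i in range(n)) for a in support]
    for j, value in zip(support, solve_linear(G, rhs)):
        beta[j] = value
    return beta
```

Entries outside the support stay exactly zero, so the refit changes only the magnitudes that the first stage may have estimated inaccurately.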

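In further detail, the optimization problem (1) can be expressed as

\[
\hat{\beta} \in \operatorname*{argmin}_{\beta} \left\{ \|\beta\|_1 + \iota_C(A\beta) \right\}, \tag{2}
\]

where $A = D^{-1}X^TX$, $b = D^{-1}X^Ty$, $C = \{\beta : \|\beta - b\|_\infty \le \delta\}$ and
$\iota_C(\cdot)$ is an indicator function on the set $C$. Solutions to Equation (2) can be
characterized by $\beta$ and $\tau$ satisfying

\[
\begin{cases}
\beta = \operatorname{prox}_{\frac{1}{\lambda}\|\cdot\|_1}\left(\beta - \frac{1}{\lambda}A^T\tau\right), \\
\tau = \left(I - \operatorname{prox}_{\iota_C}\right)(A\beta + \tau),
\end{cases} \tag{3}
\]

where, for a function $f$ with parameter $\lambda$, the proximity operator is defined by

\[
\operatorname{prox}_{\lambda f}(x) := \operatorname*{argmin}_{u} \left\{ \frac{1}{2\lambda}\|u - x\|_2^2 + f(u) \right\}.
\]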
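Both proximity operators needed here have simple closed forms, which the sketch below illustrates. For the scaled $\ell_1$ norm the proximity operator is entrywise soft thresholding, and for the indicator function of the box $C$ it is the projection onto $C$, that is, entrywise clipping to within $\delta$ of $b$. The function names are hypothetical. The last function computes $(I - \operatorname{prox}_{\iota_C})(z)$ directly, which can be compared with soft thresholding of $z - b$ at level $\delta$.

```python
import math

def soft_threshold(v, t):
    """prox of t*||.||_1: shrink each entry toward zero by t."""
    return [math.copysign(max(abs(x) - t, 0.0), x) for x in v]

def project_onto_C(z, b, delta):
    """prox of the indicator of C = {v : ||v - b||_inf <= delta}: entrywise clipping."""
    return [min(max(zi, bi - delta), bi + delta) for zi, bi in zip(z, b)]

def residual_of_projection(z, b, delta):
    """(I - prox_{iota_C})(z), computed directly from the projection."""
    return [zi - pi for zi, pi in zip(z, project_onto_C(z, b, delta))]

# Small check of the identity (I - prox_{iota_C})(z) = soft_threshold(z - b, delta)
z = [3.0, 0.5, -2.5]
b = [1.0, 0.0, -1.0]
lhs = residual_of_projection(z, b, 1.0)
rhs = soft_threshold([zi - bi for zi, bi in zip(z, b)], 1.0)
```

In this example both `lhs` and `rhs` evaluate to the same vector, so an iteration built on these operators needs only soft thresholding.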

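The iterative method developed as part of this research effort approximates a solution
of (3) as the fixed point solution of

\[
\begin{cases}
\tau^{k+1} = \operatorname{prox}_{\delta\|\cdot\|_1}\left(A(2\beta^k - \beta^{k-1}) + \tau^k - b\right), \\
\beta^{k+1} = \operatorname{prox}_{\frac{1}{\lambda}\|\cdot\|_1}\left(\beta^k - \frac{1}{\lambda}A^T\tau^{k+1}\right),
\end{cases} \tag{4}
\]

for $k = 1, 2, 3, \ldots$. The first update uses the identity
$(I - \operatorname{prox}_{\iota_C})(z) = \operatorname{prox}_{\delta\|\cdot\|_1}(z - b)$, which holds
because $\operatorname{prox}_{\iota_C}$ is the projection onto $C$. The fixed point solution of (4)
can therefore be computed straightforwardly using a soft thresholding operator. The overall
complexity of this iterative approach is $O(np)$, where $n$ is the dimension of the observation
$y$ and $p$ is the dimension of the Dantzig selector $\beta$.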
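The first-stage iteration can be sketched in a few lines once both proximity operators are written as soft thresholding. The code below is a minimal illustration under stated assumptions, not the implementation from [6]: $D$ is taken to be the diagonal matrix of column norms of $X$, the iterates start at zero, the step parameter `lam` is set slightly above the squared Frobenius norm of $A$ (a conservative choice assumed here to keep the iteration stable), and a fixed iteration count replaces a stopping rule. For readability the matrix $A$ is formed explicitly, which is more expensive than applying $X$ and $X^T$ in sequence.

```python
import math

def soft_threshold(v, t):
    """Entrywise soft thresholding at level t."""
    return [math.copysign(max(abs(x) - t, 0.0), x) for x in v]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def dantzig_fixed_point(X, y, delta, iters=500):
    """Approximate a Dantzig selector by iterating the two soft-thresholding updates."""
    n, p = len(X), len(X[0])
    Xt = transpose(X)
    d = [math.sqrt(sum(x * x for x in col)) for col in Xt]  # assumed diagonal of D
    # A = D^{-1} X^T X and b = D^{-1} X^T y
    A = [[sum(Xt[i][r] * Xt[j][r] for r in range(n)) / d[i] for j in range(p)]
         for i in range(p)]
    b = [sum(Xt[i][r] * y[r] for r in range(n)) / d[i] for i in range(p)]
    At = transpose(A)
    lam = 1.01 * sum(a * a for row in A for a in row)  # assumed step parameter
    beta_prev, beta, tau = [0.0] * p, [0.0] * p, [0.0] * p
    for _ in range(iters):
        # tau update: soft threshold A(2 beta^k - beta^{k-1}) + tau^k - b at level delta
        extrap = [2.0 * bk - bp for bk, bp in zip(beta, beta_prev)]
        z = [az + t - bi for az, t, bi in zip(matvec(A, extrap), tau, b)]
        tau = soft_threshold(z, delta)
        # beta update: soft threshold beta^k - A^T tau^{k+1} / lam at level 1 / lam
        grad = matvec(At, tau)
        beta_prev = beta
        beta = soft_threshold([bk - g / lam for bk, g in zip(beta, grad)], 1.0 / lam)
    return beta
```

As a sanity check, when $X$ is the identity matrix the problem decouples by entry and the Dantzig selector is the soft thresholding of $y$ at level $\delta$, which the iteration reproduces.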

The above method was compared to the popular Alternating Direction Method (ADM)
for finding the Dantzig selector. We found that while the accuracy of the approximated
solutions was nearly equal for the two approaches, our method was significantly faster. Notably,
the difference in CPU runtime for the two approaches grew larger as the dimensionality
increased and also as the original data was corrupted by more noise.

To demonstrate the strength of the proposed method, Drs. Prater, Shen and Suter used
a large, heterogeneous data set of biomarker data and employed the Dantzig selector as a
classifier to predict whether a patient may have a future leukemia diagnosis. The dataset
included numerical data for over 7000 biomarkers. It is likely that only a small number of
genes will contribute to the likelihood of a patient developing leukemia in the future, but
it is a difficult problem for even a medically trained individual to identify the contributing
genes from the huge amount of data. We found that the Dantzig selector performed well in
determining the small number of biomarkers from this dataset that contribute most to
a patient developing leukemia. The Dantzig selector was used in the following manner. To
train the output classes, we used a simple supervised machine learning approach. A portion
of the dataset was designated as ground truth, and the corresponding values of the observed
data vector $y$ were set equal to 1 if a patient had a leukemia diagnosis and 0 otherwise.
The trained value of the Dantzig selector $\beta$ was then used with the unclassified test data
to compute the output observation $y_{\text{test}}$. Finally, the diagnoses of the patients in the
test category were predicted using a clustering method with two classes.
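The prediction step described above can be sketched as follows. The report does not name the clustering method, so this example assumes a one-dimensional k-means with two centers applied to the scores $y_{\text{test}} = X_{\text{test}}\beta$, and it assumes that the cluster with the larger center corresponds to the label 1. The function names and the toy data are hypothetical.

```python
def predict_scores(X_test, beta):
    """Compute y_test = X_test beta, one score per test row."""
    return [sum(x * b for x, b in zip(row, beta)) for row in X_test]

def two_class_cluster(scores, iters=100):
    """1-D k-means with k = 2; label 1 marks the cluster with the larger center."""
    lo, hi = min(scores), max(scores)
    if lo == hi:
        return [0] * len(scores)
    labels = []
    for _ in range(iters):
        # Assign each score to the nearer of the two centers
        labels = [1 if abs(s - hi) < abs(s - lo) else 0 for s in scores]
        group0 = [s for s, l in zip(scores, labels) if l == 0]
        group1 = [s for s, l in zip(scores, labels) if l == 1]
        new_lo = sum(group0) / len(group0)
        new_hi = sum(group1) / len(group1)
        if new_lo == lo and new_hi == hi:
            break
        lo, hi = new_lo, new_hi
    return labels

# Toy example: only the first feature carries weight in the trained selector.
X_test = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
labels = two_class_cluster(predict_scores(X_test, [0.9, 0.0]))
```

Because the trained $\beta$ is sparse, only the few features in its support influence the scores, which is what makes the selected biomarkers interpretable.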

The  method  proposed  above  took  on  average  less  than  0.01  seconds  for  most  parameter 
selections  and  yielded  only  one  misdiagnosed  patient  out  of  204  trials.  In  comparison,  the 
ADM  required  more  than  100  seconds  on  average  and  misdiagnosed  7  patients  out  of  the 
204  trials. 

Further details on the discussion above can be found in [6].

4.2  Scheme  to  Separate  and  Classify  Composite  Data

After developing the algorithm described above to approximate the Dantzig selector,
attention was turned to extending both the model and the algorithm to accept more general
types of signals. The Dantzig selector model in Equation (1) yields a sparse solution $\beta$ only
for observations $y$ admitting such a representation. This is a rather restrictive class of
signals. Instead, we looked to incorporate into the model not only a representation basis, but
overcomplete dictionaries so one could use the model to analyze a rich class of images and
signals.

To this end, suppose that $c$ is the signal or image one wishes to analyze, and that it is
comprised of several other atomic signals. Say, $c = c_1 + c_2 + \cdots + c_k$, where each individual
component $c_j$ is unknown. In applications it is unlikely that the individual components are
sparse, but each one could admit a sparse representation $c_j = B_j\beta_j$, where $\beta_j$ is a sparse
vector and $B_j$ is a basis or dictionary. The $B_j$s could possibly coincide. In practice one does
not know a priori the sparse vectors $\beta_j$, but typically one knows a good sparsifying basis $B_j$.
Given the observation $y = Xc$, where $X$ is defined as in (1), one can incorporate these bases
into the model as

\[
\hat{\beta} = \operatorname*{argmin}_{\beta} \left\{ \|\beta\|_1 : \|D^{-1}B^TX^T(XB\beta - y)\|_\infty \le \delta \right\}, \tag{5}
\]

with the matrix $B$ equal to the concatenation of the bases $B_j$ and the vector $\beta$ equal to the
concatenation of the sparse vectors $\beta_j$.

The proximity operator fixed point based algorithm described above can be easily
extended to include these overcomplete dictionaries by redefining $A$, $b$ and $C$ as

\[
A = D^{-1}B^TX^TXB, \quad b = D^{-1}B^TX^Ty, \quad \text{and} \quad C = \{\beta : \|b - \beta\|_\infty \le \delta\}.
\]

In [7], several numerical experiments illustrate the effectiveness of the scheme
incorporating the overcomplete dictionaries into the Dantzig selector model. One experiment in
particular is interesting in that it is paired with supervised machine learning techniques to
perform separation and classification of images using the principal components of the Dantzig
selectors of the components. In the example, compositions of two handwritten digits are
separated and classified. The handwritten digits are taken from the United States Postal Service
data set [10], which was then split into a training set and a testing set. The overcomplete
dictionaries were then formed using the principal components from the labeled examples
from each class included in the training set. That is, each $B_j$ was formed by the first $k$
principal components of the collection $R_j$, where $R_j$ was all training images in the $j$th class.
Supposing that $c$ was a composition of two unlabeled digits taken from the testing dataset,
the Dantzig selector of $c$ was computed from Equation (5). The unknown components of $c$
can then be immediately recovered and classified through the Dantzig selector, based on the
support of $\beta$. An illustration of this method is shown in Figure 1.
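The recovery and classification step can be sketched as follows, assuming the coefficient vector has already been computed from Equation (5). The vector is split into the blocks $\beta_j$ that belong to each class dictionary, the classes whose blocks carry the most energy are taken to be present, and each component is rebuilt as $c_j = B_j\beta_j$. Ranking classes by the squared norm of their block is an assumption made for this illustration; the report states only that the decision is based on the support of $\beta$. The function names are hypothetical.

```python
def split_blocks(beta, block_sizes):
    """Split the concatenated vector beta into one block per dictionary."""
    blocks, start = [], 0
    for size in block_sizes:
        blocks.append(beta[start:start + size])
        start += size
    return blocks

def separate_components(dictionaries, beta, num_components=2):
    """dictionaries[j] is B_j stored as a list of rows; beta is the concatenated vector.

    Returns the indices of the selected classes and the recovered components.
    """
    sizes = [len(B[0]) for B in dictionaries]
    blocks = split_blocks(beta, sizes)
    # Assumed decision rule: classes with the largest coefficient energy are present
    energy = [sum(v * v for v in blk) for blk in blocks]
    ranked = sorted(range(len(blocks)), key=lambda j: energy[j], reverse=True)
    chosen = sorted(ranked[:num_components])
    # Recover each selected component as c_j = B_j beta_j
    recovered = {
        j: [sum(b * v for b, v in zip(row, blocks[j])) for row in dictionaries[j]]
        for j in chosen
    }
    return chosen, recovered

# Toy dictionaries for three classes acting on signals of length 3.
B0 = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
B1 = [[0.0], [0.0], [1.0]]
B2 = [[1.0], [1.0], [1.0]]
chosen, recovered = separate_components([B0, B1, B2], [2.0, 0.0, 3.0, 0.0])
```

In the handwritten digit experiment each $B_j$ would hold the leading principal components of one digit class, so the selected indices are the predicted digit labels.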

Further  details  on  the  discussion  above  can  be  found  in  [7]. 

Figure 1: A composition of unknown handwritten digits separated using the Dantzig selector
with trained overcomplete dictionaries. (Panels: Original Image; Actual Digit 1; Actual Digit 2;
Recovered Digit 1; Recovered Digit 2.)

4.3  Method  to  Perform  Rotational  Invariant  Pattern  Recognition

Under this research effort, a mathematically rigorous method to perform pattern recognition
in noisy, possibly rotated images was developed that computes feature vectors using a sparse
representation of the data. The approach was to write the unclassified two-dimensional
image in terms of the bivariate Hermite polynomials, say

\[
f(x, y) = \sum_{(n_1, n_2) \in W} c_{(n_1,n_2)} H_{(n_1,n_2)}(x, y), \tag{6}
\]

where $f$ describes the image, $H_{(n_1,n_2)}$ is the $(n_1, n_2)$-th Hermite polynomial,
$c_{(n_1,n_2)}$ are real-valued coefficients, and $W$ is an appropriately chosen index set. The
Hermite polynomials are used in the Fourier-like expansion (6) because certain combinations of
their moments are rotation invariant. The $(n_1, n_2)$ geometric Gaussian-Hermite moment of the
image $f$ is defined by

\[
m_{(n_1,n_2)} := \iint_{\mathbb{R}^2} f(x, y) H_{(n_1,n_2)}(x, y) e^{-(x^2+y^2)/2} \, dx \, dy. \tag{7}
\]

In [8], it is shown that for carefully chosen $W$, the $(n_1, n_2)$-th geometric Gaussian-Hermite
moment of $f$ can be well approximated by the $(n_1, n_2)$-th coefficient appearing in the
expansion (6).

Computing  the  coefficients  appearing  in  (6)  is  nontrivial.  Each  one  is  defined  by  a  highly 
oscillatory  bivariate  integral  for  which  closed-form  solutions  exist  in  only  rare  cases.  Each 
is  difficult  to  compute  either  directly  or  using  quadrature  methods,  and  one  must  perform 
this  approximation  as  many  times  as  the  cardinality  of  the  index  set  W.  To  quickly  and 
accurately  approximate  the  coefficients,  and  therefore  also  the  geometric  Gaussian-Hermite 
moments  of  an  image,  Dr.  Prater  proposed  in  [8]  using  a  sparse  collocation-based  approach. 
Suppose $X$ is the Jacobi-like matrix defined by $X(j, k) = H_k(x_j)$, with the univariate Hermite
polynomial $H_k$ and predetermined nodes $A = \{x_1, x_2, \ldots, x_m\}$. Then one can approximate
the entire collection of coefficients $\{c_{(n_1,n_2)}\}$ appearing in (6) by solving the optimization
problem

\[
\begin{cases}
\text{minimize} & \|c\|_1 \\
\text{subject to} & \|D^{-1}X^T(Xc - f)\|_\infty \le \delta.
\end{cases} \tag{8}
\]
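The collocation matrix can be assembled with the three-term recurrence for Hermite polynomials, as sketched below. The report does not state which normalization of the Hermite polynomials is used, so this example assumes the probabilists' convention, $He_{k+1}(x) = x\,He_k(x) - k\,He_{k-1}(x)$, with the column index $k$ starting at zero. The function names are hypothetical, and the construction shown is univariate only.

```python
def hermite_values(x, max_degree):
    """Return [He_0(x), ..., He_max_degree(x)] using He_{k+1} = x He_k - k He_{k-1}.

    Assumption: probabilists' Hermite polynomials (He_0 = 1, He_1 = x).
    """
    values = [1.0]
    if max_degree >= 1:
        values.append(float(x))
    for k in range(1, max_degree):
        values.append(x * values[k] - k * values[k - 1])
    return values

def collocation_matrix(nodes, max_degree):
    """Matrix with entry (j, k) equal to He_k evaluated at the j-th node."""
    return [hermite_values(x, max_degree) for x in nodes]
```

For example, `hermite_values(2.0, 3)` evaluates $1$, $x$, $x^2 - 1$ and $x^3 - 3x$ at $x = 2$. With such a matrix in hand, problem (8) has the same form as the Dantzig selector problem (1), so the same fixed point iteration applies.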

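Suppose $c$ is the solution to (8). One can approximate the rotation invariants of the
Gaussian-Hermite moments from $c$ using the equations documented in [8, 9]. The first few
rotation invariants of the Gaussian-Hermite moments are given by:

\[
\begin{aligned}
\phi_1 &= c_{(2,0)} + c_{(0,2)}, \\
\phi_2 &= \left(c_{(3,0)} + c_{(1,2)}\right)^2 + \left(c_{(0,3)} + c_{(2,1)}\right)^2, \\
\phi_3 &= \left(c_{(2,0)} - c_{(0,2)}\right)\left[\left(c_{(3,0)} + c_{(1,2)}\right)^2 - \left(c_{(0,3)} + c_{(2,1)}\right)^2\right]
          + 4c_{(1,1)}\left(c_{(3,0)} + c_{(1,2)}\right)\left(c_{(0,3)} + c_{(2,1)}\right).
\end{aligned}
\]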
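Once the coefficients are available, the invariants are inexpensive to evaluate. The sketch below assumes the first three invariants take the familiar second- and third-order forms: a sum of the two pure second-order coefficients, a sum of two squared third-order combinations, and a mixed term. The function name, the symbols `phi1` to `phi3`, and the dictionary layout keyed by index pairs are assumptions made for illustration; the authoritative expressions are those documented in [8, 9].

```python
def rotation_invariants(c):
    """Evaluate three low-order invariants from coefficients keyed by (n1, n2).

    Assumed forms:
      phi1 = c20 + c02
      phi2 = (c30 + c12)^2 + (c03 + c21)^2
      phi3 = (c20 - c02) * [(c30 + c12)^2 - (c03 + c21)^2]
             + 4 * c11 * (c30 + c12) * (c03 + c21)
    """
    a = c[(3, 0)] + c[(1, 2)]
    b = c[(0, 3)] + c[(2, 1)]
    phi1 = c[(2, 0)] + c[(0, 2)]
    phi2 = a ** 2 + b ** 2
    phi3 = (c[(2, 0)] - c[(0, 2)]) * (a ** 2 - b ** 2) + 4.0 * c[(1, 1)] * a * b
    return [phi1, phi2, phi3]

# Hypothetical coefficient values, chosen only to exercise the formulas.
coeffs = {(2, 0): 1.0, (0, 2): 2.0, (1, 1): 0.5,
          (3, 0): 1.0, (1, 2): 1.0, (0, 3): 0.0, (2, 1): 1.0}
phi = rotation_invariants(coeffs)
```

The resulting vector is the feature that is compared across images in the classification step.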


To perform rotational invariant pattern recognition, we classify images according to how
closely the collection of the rotation invariants matches those of labeled test images. That is,
suppose $\{\Phi_1, \Phi_2, \ldots\}$ are vectors of rotation invariants of the Gaussian-Hermite
moments of the labeled images $\{F_1, F_2, \ldots\}$, and let $\Phi$ be the rotation invariants of the
Gaussian-Hermite moments of the unclassified image $f$ computed using method (8). Then classify
the image $f$ as a rotation of image $F_j$ if

\[
\|\Phi_j - \Phi\| \le \|\Phi_k - \Phi\| \quad \forall k. \tag{9}
\]
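The decision rule in Equation (9) is a nearest-neighbor search in the space of invariant vectors, which can be sketched as follows. The Euclidean norm is an assumption, since the choice of norm is not recoverable from the text here, and the function name and the example labels are hypothetical.

```python
import math

def classify_by_invariants(labeled, query):
    """Return the label whose invariant vector is closest to the query vector.

    labeled: dict mapping a label to its vector of rotation invariants.
    Assumption: distances are measured in the Euclidean norm.
    """
    def distance(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

    return min(labeled, key=lambda name: distance(labeled[name], query))

# Hypothetical invariant vectors for two labeled images.
labeled = {"F1": [3.0, 5.0, 1.0], "F2": [0.0, 1.0, 0.0]}
best = classify_by_invariants(labeled, [2.9, 5.2, 0.8])
```

Only one invariant vector per labeled image needs to be stored, regardless of how the unclassified image is rotated.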


The method described above is computationally superior to more direct ‘brute-force’ style
methods. The direct method would have several rotations of each example image included in
the training data set, resulting in more differences and comparisons to make in Equation (9).

For  more  details  on  the  above,  including  numerical  experiments  using  real-world  and 
simulated  noise-free  and  noisy  data,  see  [8].  The  above  work  demonstrated  this  research  can 
be  an  effective  pre-processing  step  and  classification  strategy  for  certain  machine  learning 
tasks.  This  research  sub-project  will  continue  to  be  studied  in  the  Neuromorphic  Computing 
group  at  AFRL/RI. 


5  Conclusions 

Throughout the research project, models and algorithms were explored and developed that
can be used to perform supervised classification and pattern recognition in large,
heterogeneous data sets. The research was broad in scope and has direct applicability in several
data domains, including the previously stated area of recognizing vehicles from several data
sensor products. The PI will continue to pursue these directions in new research efforts.
A  List  of  Acronyms

ADM   Alternating Direction Method
AFRL  Air Force Research Laboratory
CCD   Coherent Change Detection
CPU   Central Processing Unit
CTC   Core Technical Competencies
DoD   Department of Defense
GMTI  Ground Moving Target Indicator
ISR   Intelligence, Surveillance and Reconnaissance
NCD   Noncoherent Change Detection
OCR   Optical Character Recognition
PCA   Principal Component Analysis
SAR   Synthetic Aperture Radar
SVD   Singular Value Decomposition
USPS  United States Postal Service